THE PURPOSE OF THIS INVESTIGATION WAS TO DETERMINE 
WHETHER THE PRELIMINARY SCHOLASTIC APTITUDE TEST PRESENTED A 
DIFFERENTIAL DIFFICULTY FOR RACIAL AND SOCIOECONOMIC GROUPS. 
THE SUBJECTS WERE TWO GROUPS TOTALING 1,410 NEGRO AND WHITE 
HIGH SCHOOL SENIORS IN AN INTEGRATED HIGH SCHOOL WHO HAD 
TAKEN THE TEST. THEY WERE DIVIDED INTO THREE SOCIOECONOMIC 
LEVELS ON THE BASES OF FATHER'S OCCUPATION, FATHER'S AND 
MOTHER'S EDUCATION, AND A SPECIAL INDEX (HOUSE-HOME). A 
THREE-FACTOR ANALYSIS OF VARIANCE DESIGN (RACE, SOCIOECONOMIC 
STATUS, AND ITEMS ON THE MATHEMATICAL AND VERBAL SECTIONS OF 
THE EXAMINATION) WAS USED TO INTERPRET THE RESULTS. THE 
AUTHORS FOUND THAT FEW ITEMS PRODUCED AN UNCOMMON DISCREPANCY 
BETWEEN THE PERFORMANCE OF NEGRO AND WHITE STUDENTS AND THAT, 
IF THE TEST SCORES WERE DISCRIMINATORY, THE DISCRIMINATION 
WAS NOT LARGELY ATTRIBUTABLE TO PARTICULAR ITEMS ON THE TEST 
BUT TO THE TEST AS A WHOLE. (NH) 

Abstract 

For this research, bias was defined as an item x group interaction. 
Grade 12 students in integrated high schools who had taken two forms of 
the PSAT were divided into two races and three socioeconomic levels 
within the races. Four analyses of variance were performed: one for the 
Verbal section and one for the Mathematical section of each of the two 
forms of the PSAT. Because of the large sample sizes used, the tested 
effects were expected to be and were significant. However, only a minimal 
percentage of the total variance was contributed by the item x group 
interactions. On this basis, it was concluded that, if PSAT scores are 
discriminatory, the discrimination is not largely attributable to 
particular items, but to the test as a whole. 

An Investigation of Item Bias 

T. Anne Cleary and Thomas L. Hilton 

As the scope of educational testing has increased, there has been a 
concomitant increase in concern about the applicability of widely used tests 
to different cultural groups. The College Entrance Examination Board, for 
example, has become concerned about the appropriateness of the Scholastic 
Aptitude Test (SAT) and the Preliminary Scholastic Aptitude Test (PSAT) for 
subgroups of the population, particularly Negro Americans. 

It is often difficult to determine what is meant by the word "bias" when 
it is used in reference to tests. Test "bias" is explored here in terms of 
individual test items. An item of a test is said to be biased for members of 
a particular group if, on that item, the members of the group obtain an 
average score which differs from the average score of other groups by more 
or less than expected from performance on other items of the same test. 
That is, the biased item produces an uncommon discrepancy between the 
performance of members of the group and members of other groups. In terms 
of the analysis of variance, bias is defined as an item x group interaction. 
There can be no connotation of "unfair" associated with this definition of 
bias. The mean of the particular group may be higher or lower than expected. 

Previous research has indicated that there are few, if any, sets of 
items in the SAT which show unusual discrepancies between the performance of 
Negro and white students. Roberts (1962) did an item analysis of a 1961 
form of the SAT administered to a sample of Fisk University freshmen and 
compared his results with those of a College Board national sample. For the 
item analysis, Roberts used the upper and lower 50 cases from a sample of 
15 students. On the average, the SAT items were more difficult for Fisk 
students than for the national sample, but there was no evidence of bias in 
particular items. The deltas (footnote 2) for the Fisk sample tended to be 
about three points higher than those for the College Board sample, but the 
variance of the differences was approximately what had been observed in 
other nonrandom samples. The Roberts study also indicated that timing was 
not an important factor in lowering the Fisk scores: the later items in the 
test did not differ noticeably from the earlier items in degree of 
discrimination or difficulty. 
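
The delta index used in this comparison is defined in the footnotes as a normal deviate on a scale with mean 13 and standard deviation 4, with low deltas for easy items. The conversion can be sketched as follows. This is only an illustration: the function name `delta_from_p` and the example proportions are hypothetical, and the sign convention (deviate taken from the proportion answering incorrectly) is an assumption chosen so that easy items receive low deltas.

```python
from statistics import NormalDist


def delta_from_p(p_correct: float) -> float:
    """Convert a proportion correct (0 < p < 1) to a delta value.

    Assumption: delta = 13 + 4 * z, where z is the standard normal
    deviate below which a proportion (1 - p_correct) of the
    distribution falls, so that high p (easy item) gives a low delta.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


# Hypothetical proportions, chosen only to show the direction of the scale.
for p in (0.85, 0.50, 0.20):
    print(f"p = {p:.2f} -> delta = {delta_from_p(p):.1f}")
```

On this scale a three-point difference in delta, as reported for the Fisk sample, corresponds to three quarters of a standard deviation of the underlying normal deviate.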

Cardall and Coffman (1964) applied the analysis of variance design for 
two-factor experiments with repeated measures on one factor to the problem 
of item bias. In their suggested use of the design, several random samples 
are drawn from each of the groups being compared in order to allow the 
variance within groups to be used as an estimate of error. This design 
allows the testing of two hypotheses which are of interest in the study of 
item bias. First, are there significant group main effects, that is, do 
the groups differ significantly in mean scores? Second, is there a 
significant interaction between items and groups, that is, are selected 
items relatively easier for one group than for another? If there is no 
significant interaction, one may conclude that the test is homogeneous 
across groups, and that, if a difference in item difficulty exists, it is 
present equally in all items. 

From the May 1963 administration of the SAT, Cardall and Coffman drew 
three samples of 300 cases from each of three groups: group 1 answer sheets 
were selected from rural centers in the Midwest, group 2 answer sheets were 
selected from centers in New York City, and group 3 answer sheets were 
selected from centers in the Southeast where only Negro candidates were 
registered. 

Two analyses of variance were performed, one for the 40 verbal items in 
Section II of the test, and the other for 25 mathematical items in Section 
III. For both the verbal and mathematical items, Cardall and Coffman found 
significant group main effects. The major differences in both cases were 
between groups 1 and 3 and groups 2 and 3; the mean performance of group 3 
was lower in each case. 

The interaction between groups and items was also highly significant in 
both analyses. Since there were three samples in each of the three groups, 
Cardall and Coffman were able to compute independent correlations of item 
difficulties within and between groups. Two of the samples within 
each group were used to find the within-group correlations between item 
difficulties; the third sample in each group was used to find the 
correlations between the item difficulties of the different groups. The 
within-group correlations indicated the degree to which the item 
difficulties varied in different samples from the same group. The 
between-group correlations indicated the degree to which the relative 
difficulty of the items remained constant from one group to the next. 

For both verbal and mathematical items, the within-group correlations 
were very high (between .96 and .99). For the verbal items, the 
between-group correlations involving group 3 were much lower than the 
within-group correlations. Thus, a major factor in determining the 
significance of the interaction for verbal items appeared to be the lack of 
correspondence between the relative difficulties of the items for group 3 
and the other two groups. Since the three between-group correlations for 
the mathematical items were similar, it appeared that the factors 
accounting for the significant interaction were evenly distributed across 
the three groups. 

The Cardall and Coffman analysis indicated that the SAT items did not 
retain the same relative difficulty across groups, but the analysis did not 
indicate whether the discrepancies were all in one direction or balanced so 
that one group was not favored over another. Fremer (footnote 3) continued 
the Cardall and Coffman study by plotting the arcsin transformations of the 
item difficulties for each pair of groups. Fremer found that two items with 
a distinctly rural flavor were very much easier for group 1 than for the 
other two groups. Thus it might be said that the test has a slight rural 
bias. When group 3 was compared with each of the other groups, a slight 
curvilinearity was found in the plots, but this appeared to be due to a 
"floor" effect in the group 3 responses. This curvilinearity would have 
attenuated the correlations between the item difficulties of group 3 and 
the other groups. 
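
The arcsin transformation mentioned above can be illustrated with a short sketch. The text does not state which form Fremer used; the common variance-stabilizing form 2·arcsin(√p) is assumed here, and the function name and item proportions are hypothetical.

```python
import math


def arcsin_transform(p: float) -> float:
    """Variance-stabilizing transform of a proportion: 2 * asin(sqrt(p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


# Hypothetical proportions correct on five items for two groups.
group_a = [0.90, 0.75, 0.60, 0.40, 0.25]
group_b = [0.80, 0.60, 0.45, 0.30, 0.20]

# Each pair is one point in a bivariate plot of transformed difficulties.
pairs = [(arcsin_transform(a), arcsin_transform(b))
         for a, b in zip(group_a, group_b)]
for x, y in pairs:
    print(f"{x:.3f}  {y:.3f}")
```

Plotting such pairs for every item gives one point per item; items that fall far from the general trend are the candidates for an item x group discrepancy.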



Purpose 

The purpose of this research was to study the variation of Preliminary 
Scholastic Aptitude Test (PSAT) item scores in different racial and 
socioeconomic (SES) groups. The questions asked were whether the test items 
are equally difficult for all groups, whether the group mean scores across 
items differ by groups, or whether both group means and relative scores on 
individual items change as a function of race, SES within race, or both. 
Although the primary question at hand was the possibility of differential 
difficulty for racial groups, the inclusion of SES as a factor in the 
research made it possible to study this variation as well as the variation 
associated with race. 


Procedure 

Sample. Every two years, as a part of a longitudinal study of academic 
growth, a large sample of twelfth-grade students is given the PSAT. For this 
research, seven integrated schools in three large metropolitan centers were 
selected from the larger sample, and the race of the 1961 (Group I) and 1963 
(Group II) twelfth-graders was identified by the school administrators. In 
order to have an equal number of students of each race, it was necessary to 
use all available Negro students and randomly sample the white students. For 
the analyses, Group I had 636 students; Group II, 774. 

Group I took form IPT1 of the PSAT; Group II took form KPT 
(a parallel form). The five-option items were scored by giving one point for 
a correct response, zero for no response, and minus one-quarter for an 
incorrect response. This scoring method was based on the formula used to 
obtain the total test score. 
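
The item scoring rule just described can be written out as a brief sketch. The function names, answer key, and responses below are hypothetical and serve only to illustrate the rule.

```python
from typing import Optional, Sequence


def item_score(response: Optional[str], key: str) -> float:
    """Score one five-option item: +1 correct, 0 omitted, -1/4 incorrect."""
    if response is None:  # no response
        return 0.0
    return 1.0 if response == key else -0.25


def total_score(responses: Sequence[Optional[str]], keys: Sequence[str]) -> float:
    """Sum the item scores over a set of items."""
    if len(responses) != len(keys):
        raise ValueError("responses and keys must have the same length")
    return sum(item_score(r, k) for r, k in zip(responses, keys))


# Hypothetical four-item example: two right, one omitted, one wrong.
keys = ["A", "C", "E", "B"]
responses = ["A", None, "D", "B"]
print(total_score(responses, keys))
```

With five options the penalty of one-quarter equals 1/(k - 1) for k = 5, so the expected score from blind guessing on an item is (1/5)(1) + (4/5)(-1/4) = 0.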

SES was defined by information from a background and experience 
questionnaire which was completed by Groups I and II in 1963. An SES score 
was obtained from questions on father's occupation, mother's and father's 
education, and the House-Home Index (Kerr & Remmers, 1942). Students within 
each race were then ordered from high to low and divided into three equal 
SES groups: high, middle, and low SES. In all cases, the cutting scores 
for the SES levels were lower for the Negro SES groups than for the white 
groups. 

Method of analysis. A three-factor analysis of variance design was 
used. Figure 1 gives a schematic representation of the design. The first 
factor was race, which was considered a fixed factor. A second factor was 
SES, which was considered fixed and nested within race. The nesting of SES 
within race avoided the assumption that the SES levels are comparable in the 
two races. A third factor was items, which was considered random. 

The linear model for the design had the form: 

$$X_{pirs} = \mu + R_r + S_{s(r)} + I_i + P_{p(rs)} + IR_{ir} + IS_{is(r)} + IP_{ip(rs)}$$

where 

$X_{pirs}$ = the score of the $p$th person of the $r$th race and $s$th 
socioeconomic level on the $i$th item, 

$\mu$ = the grand mean, 

$R_r$ = effect of the $r$th race, $r = 1, \ldots, N_r$, 

$S_{s(r)}$ = effect of the $s$th socioeconomic level within the $r$th race, 
$s = 1, \ldots, N_s$, 

$I_i$ = effect of the $i$th item, $i = 1, \ldots, N_i$, 

$P_{p(rs)}$ = effect of the $p$th person, $p = 1, \ldots, N_p$, within the 
$r$th race and $s$th socioeconomic level, 

$IS_{is(r)}$ = interaction of items and socioeconomic level within race, 

$IR_{ir}$ = interaction of items and races, 

$IP_{ip(rs)}$ = interaction of items and persons within race and 
socioeconomic level. 
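
The degrees of freedom implied by this nested design can be checked with a short sketch. The function name `anova_df` and the dictionary keys are hypothetical; the layout assumes equal numbers of persons in every race x SES cell, which matches the 106 and 129 persons per cell reported in the Results.

```python
def anova_df(n_races: int, n_ses: int, n_persons: int, n_items: int) -> dict:
    """Degrees of freedom for items crossed with persons nested in SES
    nested in race, assuming n_persons in every race x SES cell."""
    cells = n_races * n_ses
    subjects = cells * n_persons
    df = {
        "race": n_races - 1,
        "ses_within_race": n_races * (n_ses - 1),
        "persons_within_groups": cells * (n_persons - 1),
        "items": n_items - 1,
    }
    # Each interaction df is the product of the df of its two parts.
    df["items_x_race"] = df["items"] * df["race"]
    df["items_x_ses_within_race"] = df["items"] * df["ses_within_race"]
    df["items_x_persons"] = df["items"] * df["persons_within_groups"]
    df["between_subjects"] = subjects - 1
    df["within_subjects"] = subjects * (n_items - 1)
    df["total"] = subjects * n_items - 1
    return df


# Group I, verbal section: 2 races, 3 SES levels, 106 persons per cell, 70 items.
for source, value in anova_df(2, 3, 106, 70).items():
    print(f"{source:28s}{value:>8,d}")
```

For Group I on the 70 verbal items this gives 635 degrees of freedom between subjects, 43,884 within subjects, and 44,519 in total, which agrees with the sums of squares partition reported for that analysis.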

Table 1 gives the summary of the analysis of variance design. The 
expected mean squares were derived by the method of Cornfield and Tukey 
(1956). In this summary, race and socioeconomic level are considered fixed 
effects, while items and persons are considered random. 

Two analyses were performed for each group: one for the 70 items of the 
verbal section of the PSAT, and one for the 50 items of the mathematical 
section. The analysis of variance design allowed the testing of five 
hypotheses: 

(1) $\sigma^2_{IR} = 0$. That there is no interaction between items and 
race. This is the most important hypothesis. As bias has been defined here 
for items, an interaction between items and race would indicate the presence 
of racial bias, and that the pattern of item difficulties is different in 
the two racial groups. 

(2) $\sigma^2_{IS(r)} = 0$. That there is no interaction between items and 
SES within race. The presence of item-SES interaction would indicate that 
the items are biased for different SES groups within at least one of the two 
races. 

(3) $\sigma^2_{I} = 0$. That there is no difference in the mean scores of 
different items. It was expected that this hypothesis would be rejected at a 
very high level of significance because it was known that the items do differ 
considerably in difficulty. 

(4) $\sigma^2_{S(r)} = 0$. That there is no difference in mean item scores 
in different SES levels within race. It was expected that this hypothesis 
would be rejected, and that the higher SES levels would have higher mean 
scores. This hypothesis must be tested by a quasi F ratio (Satterthwaite, 
1946). 

(5) $\sigma^2_{R} = 0$. That there is no difference in mean item scores for 
the two races. This hypothesis must be tested by a quasi F ratio. 

Results 

Tables 2 and 3 contain summaries of the analyses of variance for Groups 
I and II. With 106 and 129 persons, respectively, in each of the six Race x 
SES groups, even rather small, inconsequential differences may be 
significant. As was expected, almost all tested effects were found to be 
significant. 
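
As a check on the tabled values, the F ratios for the three within-subject effects can be recomputed from the Group I verbal sums of squares and degrees of freedom in Table 2. The sketch below assumes that each of these effects is tested against the Subjects x Items mean square, an inference from the tabled values rather than a statement in the text; the quasi F ratios for Race and SES Within Race are not reproduced here. Variable names are hypothetical.

```python
# (sum of squares, degrees of freedom) from Table 2, Group I, PSAT Verbal.
table2_verbal = {
    "items": (2030.76, 69),
    "items_x_race": (89.53, 69),
    "items_x_ses_within_race": (95.75, 276),
    "items_x_subjects": (11597.58, 43470),
}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {name: ss / df for name, (ss, df) in table2_verbal.items()}

# Assumed error term: the Subjects x Items interaction mean square.
error_ms = mean_squares["items_x_subjects"]

f_ratios = {
    name: mean_squares[name] / error_ms
    for name in ("items", "items_x_race", "items_x_ses_within_race")
}

for name, f in f_ratios.items():
    print(f"{name:26s} MS = {mean_squares[name]:7.3f}   F = {f:6.1f}")
```

The Items x Race ratio is several times its error term, yet the mean square itself is small, which is why the text turns next to variance components rather than significance tests.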

A more meaningful way of looking at the results of the analyses of 
variance with such large sample sizes is to estimate the variance components 
and the percentage contribution of each effect to the total variance of a 
single observation chosen at random. These figures are given in Table 4. 
It should be remembered that a single observation is an item score which may 
have the values one, zero, or minus one-quarter. In all four analyses, a 
major percentage of the variance was contributed by the Subject x Item 
interaction. The Subject x Item interaction is treated by Hoyt (1941) as 
the variance due to error of measurement. By subtracting from one the 
proportion of variance due to Subject x Item interaction, an estimate of 
the reliability of an item can be obtained. Large percentages of the total 
variance were contributed by the effect of Persons Within Race-SES Groups 
and the effect of Items. The smallest contributions to the total variance 
were provided by the Item x Race interaction (the indicator of racial bias) 
and Item x SES Within Race interaction (the indicator of social class bias). 
Given the stated definition of bias (an item x group interaction), the PSAT 
cannot for practical purposes be considered biased for either race or SES 
within race. 
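
The variance components for one analysis can be approximated from the mean squares as sketched below for the Group I verbal section. The estimators are an assumption: Table 1 with the expected mean squares is not reproduced in this text, so each component is obtained by solving the expected mean squares one would derive for this design with items and persons random. Under that assumption the results agree with Table 4 to within rounding. All names are hypothetical.

```python
# Design sizes for Group I, verbal: races, SES levels, persons per cell, items.
N_RACES, N_SES, N_PERSONS, N_ITEMS = 2, 3, 106, 70

# (sum of squares, degrees of freedom) from Table 2, Group I, PSAT Verbal.
ss_df = {
    "race": (208.64, 1),
    "ses_within_race": (131.09, 4),
    "persons": (1268.64, 630),
    "items": (2030.76, 69),
    "items_x_race": (89.53, 69),
    "items_x_ses": (95.75, 276),
    "items_x_persons": (11597.58, 43470),
}
ms = {name: ss / df for name, (ss, df) in ss_df.items()}
e = ms["items_x_persons"]  # residual (Subject x Item) mean square

# Assumed estimators, solved from the expected mean squares.
components = {
    "race": (ms["race"] - ms["persons"] - ms["items_x_race"] + e)
            / (N_ITEMS * N_PERSONS * N_SES),
    "ses_within_race": (ms["ses_within_race"] - ms["persons"]
                        - ms["items_x_ses"] + e) / (N_ITEMS * N_PERSONS),
    "persons": (ms["persons"] - e) / N_ITEMS,
    "items": (ms["items"] - e) / (N_PERSONS * N_SES * N_RACES),
    "items_x_race": (ms["items_x_race"] - e) / (N_PERSONS * N_SES),
    "items_x_ses": (ms["items_x_ses"] - e) / N_PERSONS,
    "items_x_persons": e,
}

total = sum(components.values())
percent = {name: 100.0 * value / total for name, value in components.items()}

# Item reliability in the sense described above: one minus the proportion
# of variance due to the Subject x Item interaction.
item_reliability = 1.0 - components["items_x_persons"] / total

for name in components:
    print(f"{name:18s}{components[name]:9.5f}{percent[name]:7.1f}")
print(f"estimated reliability of a single item: {item_reliability:.3f}")
```

For this analysis the Subject x Item interaction accounts for about three quarters of the variance of a single item score, so the estimated reliability of one item is roughly .25, while the Items x Race component is under one percent of the total.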

The lack of a practically significant amount of item x group interaction 
is made clear in Figures 2 through 5, which contain bivariate plots of the 
sums of item scores for Negro and white students. From these plots, it can 
be seen that the items are on the average easier for the white students; 
the concentration of points is below the 45 degree line. However, there 
appears to be no systematic deviation from a straight line except possibly 
that caused by what appears to be a "floor" effect in the Negro scores. The 
"floor" effect, indicated by the decreased slope at the left of the plots, 
would contribute to the item x race interaction. 

Discussion 

For this research, bias was defined as an item x group interaction. In 
four separate analyses, the Item x Race and Item x SES Within Race 
interactions contributed minimal percentages of the total variance of an 
observation. From the bivariate plots of sums of item scores, it was 
apparent that there were few items producing an uncommon discrepancy between 
the performance of Negro and white students. It must therefore be concluded 
that, given the stated definition of bias, the PSAT for practical purposes 
is not biased for the groups studied. The question of bias as a total 
test score difference between groups has not been considered here. A 
second phase of this research, now in progress, is designed to investigate 
total test score differences and the way in which these differences affect 
the predictive validity of the SAT. 


Footnotes 

1. The authors are grateful to William H. Angoff, William E. Coffman, and 
Julian C. Stanley for numerous suggestions and helpful criticisms. 

2. Delta is defined as the normal deviate, expressed in terms of a scale 
with mean of 13 and standard deviation of 4, which corresponds to the 
proportion of candidates reaching the item who answer it correctly. A low 
delta describes an easy item; a high delta, a difficult one. 

3. Fremer, J. Manuscript in preparation, 1966. 

Table 2 

Summary of the Group I Analyses of Variance 

PSAT Verbal 

Source                                       df          SS        MS        F
Between Subjects                            635    1,608.37
  Race                                        1      208.64    208.64    103.3*
  SES Within Race                             4      131.09     32.77     15.7*
  Between Subjects Nested
    Within Race-SES Groups                  630    1,268.64      2.01
Within Subjects                          43,884   13,813.62
  Items                                      69    2,030.76     29.43    110.3
  Items X Race                               69       89.53      1.30      4.9
  Items X SES Within Race                   276       95.75       .35      1.3
  Interaction of Subjects with
    Items Within Race-SES Groups         43,470   11,597.58       .27
Total                                    44,519   15,421.99

PSAT Math 

Source                                       df          SS        MS        F
Between Subjects                            635    1,818.69
  Race                                        1      296.77    296.77    132.2*
  SES Within Race                             4      115.16     28.79     12.9*
  Between Subjects Nested
    Within Race-SES Groups                  630    1,406.76      2.23
Within Subjects                          31,164    8,433.96
  Items                                      49      990.40     20.21     85.5
  Items X Race                               49      102.97      2.10      8.8
  Items X SES Within Race                   196       47.04       .24      1.0
  Interaction of Subjects with
    Items Within Race-SES Groups         30,870    7,293.55       .24
Total                                    31,799   10,252.65

* Quasi F ratios 

Table 3 

Summary of the Group II Analyses of Variance 

PSAT Verbal 

Source                                       df          SS        MS        F
Between Subjects                            773    1,831.48
  Race                                        1      382.61    382.61    215.0*
  SES Within Race                             4       91.19     22.80     12.8*
  Between Subjects Nested
    Within Race-SES Groups                  768    1,357.68      1.77
Within Subjects                          53,406   16,859.17
  Items                                      69    2,334.69     33.84    130.2
  Items X Race                               69      164.47      2.38      9.2
  Items X SES Within Race                   276       94.88       .34      1.3
  Interaction of Subjects with
    Items Within Race-SES Groups         52,992   14,265.13       .26
Total                                    54,179   18,690.65

PSAT Math 

Source                                       df          SS        MS        F
Between Subjects                            773    1,431.31
  Race                                        1      221.72    221.72    147.2*
  SES Within Race                             4       68.33     17.08     11.4*
  Between Subjects Nested
    Within Race-SES Groups                  768    1,141.26      1.49
Within Subjects                          37,926   10,999.68
  Items                                      49    2,082.63     42.50    184.8
  Items X Race                               49      125.26      2.55     11.1
  Items X SES Within Race                   196       61.89       .32      1.4
  Interaction of Subjects with
    Items Within Race-SES Groups         37,632    8,729.90       .23
Total                                    38,699   12,430.99

* Quasi F ratios 



Table 4 

Estimated Components of Variance 

Group I 

                                         Verbal                  Math
Effect                             Est. σ²  % of Total    Est. σ²  % of Total
Race                                .00923       2.6       .01841       5.5
SES Within Race                     .00413       1.2       .00501       1.5
Persons Within Race-SES Groups      .02495       7.0       .03993      11.9
Items                               .04586      12.9       .03141       9.3
Items X Race                        .00324        .9       .00587       1.7
Items X SES Within Race             .00076        .2       .00004       0.0
Interaction of Subjects with
  Items Within Race-SES Groups      .26680      75.2       .23637      70.1

Group II 

                                         Verbal                  Math
Effect                             Est. σ²  % of Total    Est. σ²  % of Total
Race                                .01398       3.9       .01126       3.4
SES Within Race                     .00232        .7       .00240        .7
Persons Within Race-SES Groups      .02141       6.0       .02508       7.6
Items                               .04337      12.2       .05461      16.4
Items X Race                        .00546       1.5       .00601       1.8
Items X SES Within Race             .00058        .2       .00065        .2
Interaction of Subjects with
  Items Within Race-SES Groups      .26919      75.5       .23198      69.9

Figure 1. Schematic representation of the analysis of variance design. 